 Orange County


DeepSeek in Healthcare: A Survey of Capabilities, Risks, and Clinical Applications of Open-Source Large Language Models

Ye, Jiancheng, Bronstein, Sophie, Hai, Jiarui, Hashish, Malak Abu

arXiv.org Artificial Intelligence

ABSTRACT
DeepSeek-R1 is a cutting-edge open-source large language model (LLM) developed by DeepSeek, showcasing advanced reasoning capabilities through a hybrid architecture that integrates mixture of experts (MoE), chain of thought (CoT) reasoning, and reinforcement learning. Released under the permissive MIT license, DeepSeek-R1 offers a transparent and cost-effective alternative to proprietary models like GPT-4o and Claude-3 Opus; it excels in structured problem-solving domains such as mathematics, healthcare diagnostics, code generation, and pharmaceutical research. Its architecture enables efficient inference while preserving reasoning depth, making it suitable for deployment in resource-constrained settings. However, DeepSeek-R1 also exhibits increased vulnerability to bias, misinformation, adversarial manipulation, and safety failures, especially in multilingual and ethically sensitive contexts. This survey highlights the model's strengths, including interpretability, scalability, and adaptability, alongside its limitations in general language fluency and safety alignment. Future research priorities include improving bias mitigation, natural language comprehension, domain-specific validation, and regulatory compliance. Overall, DeepSeek-R1 represents a major advance in open, scalable AI, underscoring the need for collaborative governance to ensure responsible and equitable deployment.
INTRODUCTION
The rise of AI and generative models in health and technology
Artificial Intelligence (AI) has undergone transformative growth in recent years, profoundly reshaping numerous fields including language processing, automation, and complex decision-making. At its core, AI refers to the simulation of human intelligence by machines, enabling them to perform tasks such as speech recognition, natural language understanding, visual perception, and predictive analytics.
One of the remarkable recent advancements in the Generative AI domain is the emergence of DeepSeek-R1, a large language model (LLM) developed by the Chinese company DeepSeek. In benchmarking evaluations, it has demonstrated results competitive with, and in some domains superior to, models like OpenAI's GPT-4o and GPT-o1 [4]. This has positioned DeepSeek-R1 as a notable advancement not only in LLM capability but also in the global AI development race.
DeepSeek-R1: a paradigm shift in LLM development
What sets DeepSeek-R1 apart from conventional LLMs is its novel training architecture. This hybrid approach mimics certain aspects of human learning, allowing the model to refine its behavior over time and adapt to more complex reasoning tasks.


MedPAIR: Measuring Physicians and AI Relevance Alignment in Medical Question Answering

Hao, Yuexing, Alhamoud, Kumail, Jeong, Hyewon, Zhang, Haoran, Puri, Isha, Torr, Philip, Schaekermann, Mike, Stern, Ariel D., Ghassemi, Marzyeh

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have demonstrated remarkable performance on various medical question-answering (QA) benchmarks, including standardized medical exams. However, correct answers alone do not ensure correct logic, and models may reach accurate conclusions through flawed processes. In this study, we introduce the MedPAIR (Medical Dataset Comparing Physicians and AI Relevance Estimation and Question Answering) dataset to evaluate how physician trainees and LLMs prioritize relevant information when answering QA questions. We obtain annotations on 1,300 QA pairs from 36 physician trainees, labeling each sentence within the question components for relevance. We compare these relevance estimates to those for LLMs, and further evaluate the impact of these "relevant" subsets on downstream task performance for both physician trainees and LLMs. We find that LLMs are frequently not aligned with the content relevance estimates of physician trainees. After filtering out physician trainee-labeled irrelevant sentences, accuracy improves for both the trainees and the LLMs. All LLM and physician trainee-labeled data are available at: http://medpair.csail.mit.edu/.


Two men are arrested for 'hazardous drone operation' after flying over US airport

Daily Mail - Science & tech

Two people were arrested for allegedly conducting a 'hazardous drone operation' near a Massachusetts airport as people in New Jersey demand answers for similar sightings. Robert Duffy, 42, of Charlestown, and Jeremy Folcik, 32, of Bridgewater, were taken into custody Saturday evening after flying an Unmanned Aircraft System (UAS) near Boston's Logan Airport. The incident began at 4.30pm ET when a police officer specializing in real-time crime surveillance detected the UAS, which was smaller than the crafts being reported in New Jersey. 'Leveraging advanced UAS monitoring technology, the Officer identified the drone's location, altitude, flight history, and the operators' position on Long Island,' which is located in the Boston Harbor on the approach to the airport, the department added. Officers were dispatched to that location and found three individuals inside the decommissioned Long Island Health Campus, finding a drone inside a backpack carried by Duffy.


White House accused of flying drone cover up as New Jersey residents vow to shoot them down - live updates

Daily Mail - Science & tech

Reports of mysterious drone sightings in New Jersey have now spread to multiple states, as residents and local officials demand answers from the US Government. Numerous 'car-sized' drones have been seen hovering throughout the state since mid-November, sometimes appearing in groups and often remaining in the same place for hours at a time. The first drone sightings appeared over the US Army's Picatinny Arsenal and over President-elect Donald Trump's golf course in Bedminster on November 18. But reports of varying levels of credibility have now spread to at least 12 counties throughout the Garden State, as well as eastern Pennsylvania and Orange County, New York. The FBI and other agencies are investigating, but the Department of Homeland Security said Wednesday: 'We have no more information as to where these drones are coming from, where they're launching from, where they're landing.'


GIS Copilot: Towards an Autonomous GIS Agent for Spatial Analysis

Akinboyewa, Temitope, Li, Zhenlong, Ning, Huan, Lessani, M. Naser

arXiv.org Artificial Intelligence

Recent advancements in Generative AI offer promising capabilities for spatial analysis. Despite their potential, the integration of generative AI with established GIS platforms remains underexplored. In this study, we propose a framework for integrating LLMs directly into existing GIS platforms, using QGIS as an example. Our approach leverages the reasoning and programming capabilities of LLMs to autonomously generate spatial analysis workflows and code through an informed agent that has comprehensive documentation of key GIS tools and parameters. The implementation of this framework resulted in the development of a "GIS Copilot" that allows GIS users to interact with QGIS using natural language commands for spatial analysis. The GIS Copilot was evaluated with over 100 spatial analysis tasks at three complexity levels: basic tasks that require one GIS tool and typically involve one data layer to perform simple operations; intermediate tasks that involve multi-step processes with multiple tools, guided by user instructions; and advanced tasks that involve multi-step processes requiring multiple tools but are not guided by user instructions, so the agent must independently decide on and execute the necessary steps. The evaluation reveals that the GIS Copilot demonstrates strong potential in automating foundational GIS operations, with a high success rate in tool selection and code generation for basic and intermediate tasks, while challenges remain in achieving full autonomy for more complex tasks. This study contributes to the emerging vision of Autonomous GIS, providing a pathway for non-experts to engage with geospatial analysis with minimal prior expertise. While full autonomy is yet to be achieved, the GIS Copilot demonstrates significant potential for simplifying GIS workflows and enhancing decision-making processes.


The Curious Case of Hallucinatory (Un)answerability: Finding Truths in the Hidden States of Over-Confident Large Language Models

Slobodkin, Aviv, Goldman, Omer, Caciularu, Avi, Dagan, Ido, Ravfogel, Shauli

arXiv.org Artificial Intelligence

Large language models (LLMs) have been shown to possess impressive capabilities, while also raising crucial concerns about the faithfulness of their responses. A primary issue arising in this context is the management of (un)answerable queries by LLMs, which often results in hallucinatory behavior due to overconfidence. In this paper, we explore the behavior of LLMs when presented with (un)answerable queries. We ask: do models represent the fact that the question is (un)answerable when generating a hallucinatory answer? Our results show strong indications that such models encode the answerability of an input query, with the representation of the first decoded token often being a strong indicator. These findings shed new light on the spatial organization within the latent representations of LLMs, unveiling previously unexplored facets of these models. Moreover, they pave the way for the development of improved decoding techniques with better adherence to factual generation, particularly in scenarios where query (un)answerability is a concern.


How Vocational Education Got a 21st Century Reboot

#artificialintelligence

Erick Trickey is a writer in Boston. For a year, Rodriguez has worked 40-hour weeks as an apprentice test technician, examining IBM mainframes to confirm they work before shipping them to customers. In January, she'll move to a permanent position with a future salary that she says is "definitely much more than I ever thought I'd be making at 19." Rodriguez's opportunities with IBM came to her thanks to her high school, Newburgh Free Academy P-TECH. It's part of an innovative public-school model that combines grade 9-12 education with internships and tuition-free community college. P-TECH, which stands for Pathways in Technology Early College High School, has spread to 10 states and 17 countries since its founding in Brooklyn in 2011. The P-TECH network is growing fast.


4 questions with Rush CIO Dr. Shafiq Rab

#artificialintelligence

Dr. Shafiq Rab, CIO of Rush University Medical Center in Chicago, uses his background in public health to inform his IT vision. Dr. Rab, who completed his medical degree and internal medicine residency at Karachi, Pakistan-based Dow Medical College, had his interest in public health piqued during one of his first physician jobs. While treating an urban squatters settlement in Pakistan, he worked with non-governmental organizations to address the infant mortality rate, mainly by bringing clean drinking water to its residents. "That's how I got involved in healthcare," he says. "And I remain committed to healthcare.


Machine Translation's Past and Future

AITopics Original Links

This article has been reproduced in a new format and may be missing content or contain faulty links. Contact wiredlabs@wired.com to report an issue. The outcome is a halt in federal funding for machine translation R&D. Darpa launches its Spoken Language Systems (SLS) program to develop apps for voice-activated human-machine interaction. Researchers focus on portable systems for face-to-face English-language business negotiations in German and Japanese.